Abstract

This work concerns the estimation of linear autoregressive models with
Markov switching using the expectation-maximisation (E.M.) algorithm.
Our method generalises the method introduced by Elliott for general
hidden Markov models and avoids the use of backward recursion.

Keywords: Maximum likelihood estimation, Expectation-Maximisation
algorithm, Hidden Markov models, Switching models.

1 Introduction 

In the present paper we consider an extension of the basic hidden Markov model (HMM). Let $(X_t, Y_t)_{t \in \mathbb{Z}}$ be the process such that

1. $(X_t)_{t \in \mathbb{Z}}$ is a Markov chain in a finite state space $E = \{e_1, \ldots, e_N\}$, which can be identified without loss of generality with the simplex of $\mathbb{R}^N$, where the $e_i$ are unit vectors in $\mathbb{R}^N$, with unity as the $i$th element and zeros elsewhere.

2. Given $(X_t)_{t \in \mathbb{Z}}$, the process $(Y_t)_{t \in \mathbb{Z}}$ is a sequence of linear autoregressive models in $\mathbb{R}$, and the distribution of $Y_n$ depends only on $X_n$ and $Y_{n-1}, \ldots, Y_{n-p}$.

Hence, for a fixed $t$, the dynamics of the model are:

$$Y_{t+1} = F_{X_{t+1}}(Y_{t-p+1}^{t}) + \sigma_{X_{t+1}} \varepsilon_{t+1}$$

with $F_{X_{t+1}} \in \{F_{e_1}, \ldots, F_{e_N}\}$ linear functions, $\sigma_{X_{t+1}} \in \{\sigma_{e_1}, \ldots, \sigma_{e_N}\}$ strictly positive numbers and $(\varepsilon_t)_{t \in \mathbb{N}^*}$ an i.i.d. sequence of $\mathcal{N}(0, 1)$ Gaussian random variables.
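To make the dynamics concrete, the following is a minimal simulation sketch, not part of the original paper. The function name `simulate_msar`, its arguments and the column-stochastic layout `A[i][j] = P(X_{t+1} = e_i | X_t = e_j)` are assumptions chosen for illustration; states are coded as integers `0, ..., N-1` instead of unit vectors.

```python
import random


def simulate_msar(A, coeffs, sigmas, y_init, n_steps, x0=0, seed=0):
    """Simulate a Markov-switching AR(p) path (illustrative sketch).

    A[i][j]   : P(X_{t+1} = e_i | X_t = e_j), columns sum to 1 (assumed layout)
    coeffs[i] : (a_0, a_1, ..., a_p), coefficients of the linear function F_{e_i}
    sigmas[i] : noise scale sigma_{e_i}
    y_init    : the first p observations, oldest first
    """
    rng = random.Random(seed)
    n_states = len(sigmas)
    p = len(y_init)
    ys = list(y_init)
    xs = []
    x = x0
    for _ in range(n_steps):
        # Draw X_{t+1} from column x of the transition matrix.
        column = [A[i][x] for i in range(n_states)]
        x = rng.choices(range(n_states), weights=column)[0]
        # Lagged values (Y_t, Y_{t-1}, ..., Y_{t-p+1}).
        lags = ys[-1:-p - 1:-1]
        a = coeffs[x]
        mean = a[0] + sum(a[k + 1] * lags[k] for k in range(p))
        # Y_{t+1} = F_{X_{t+1}}(lags) + sigma_{X_{t+1}} * eps_{t+1}
        ys.append(mean + sigmas[x] * rng.gauss(0.0, 1.0))
        xs.append(x)
    return xs, ys
```

For instance, `simulate_msar([[0.9, 0.2], [0.1, 0.8]], [(0.0, 0.5), (1.0, -0.3)], [1.0, 0.5], [0.0], 200)` returns 200 regime labels and 201 observations of a two-regime AR(1) process.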
Definition 1 Write $\mathcal{F}_t = \sigma\{X_0, \ldots, X_t\}$ for the $\sigma$-field generated by $X_0, \ldots, X_t$, $\mathcal{Y}_t = \sigma\{Y_0, \ldots, Y_t\}$ for the $\sigma$-field generated by $Y_0, \ldots, Y_t$, and $\mathcal{G}_t = \sigma\{(X_0, Y_0), \ldots, (X_t, Y_t)\}$ for the $\sigma$-field generated by $X_0, \ldots, X_t$ and $Y_0, \ldots, Y_t$.

The Markov property implies here that $P(X_{t+1} = e_i \mid \mathcal{F}_t) = P(X_{t+1} = e_i \mid X_t)$. Write $a_{ij} = P(X_{t+1} = e_i \mid X_t = e_j)$ and $A = (a_{ij}) \in \mathbb{R}^{N \times N}$, and define $V_{t+1} := X_{t+1} - E[X_{t+1} \mid \mathcal{F}_t] = X_{t+1} - A X_t$. With the previous notations, we obtain the general equation of the model, for $t \in \mathbb{N}$:

$$\begin{aligned} X_{t+1} &= A X_t + V_{t+1} \\ Y_{t+1} &= F_{X_{t+1}}(Y_{t-p+1}^{t}) + \sigma_{X_{t+1}} \varepsilon_{t+1} \end{aligned} \qquad (1)$$

The parameters of the model are the transition probabilities of the matrix $A$, the coefficients of the linear functions $F_{e_i}$ and the variances $\sigma_{e_i}^2$. A successful method for estimating such a model is to compute the maximum likelihood estimator¹ with the E.M. algorithm introduced by Dempster, Laird and Rubin (1977). Generally, this algorithm requires the computation of the conditional expectation of the hidden states given the observations (the E-step); this can be done with the Baum and Welch forward-backward algorithm (see Baum et al. (1970)). The derivation of the M-step of the E.M. algorithm is then immediate, since the optimal parameters of the regression functions can be computed by weighted linear regression.

However, we show here that these two steps can be merged into one. Namely, at each iteration of the E.M. algorithm we can directly compute the optimal coefficients of the regression functions, as well as the variances and the transition matrix, thanks to a generalisation of the method introduced by Elliott (1994).

¹ This likelihood is computed conditionally on the first $p$ observations.

2 Change of measure

The fundamental technique employed throughout this paper is the discrete-time change of measure. Write $\sigma$ for the vector $(\sigma_{e_1}, \ldots, \sigma_{e_N})$, $\phi(\cdot)$ for the density of $\mathcal{N}(0, 1)$ and $\langle \cdot, \cdot \rangle$ for the inner product in $\mathbb{R}^N$.

We wish to introduce a new probability measure $\bar{P}$, using a density $\Lambda$, so that $\frac{d\bar{P}}{dP} = \Lambda$ and under $\bar{P}$ the random variables $Y_t$ are $\mathcal{N}(0, 1)$ i.i.d. random variables.
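Define

$$\lambda_l = \frac{\langle \sigma, X_{l-1} \rangle \, \phi(Y_l)}{\phi(\varepsilon_l)}, \quad l \in \mathbb{N}^*, \quad \text{with } \Lambda_0 = 1 \text{ and } \Lambda_t = \prod_{l=1}^{t} \lambda_l,$$
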

and construct a new probability measure $\bar{P}$ by setting the restriction of the Radon-Nikodym derivative $\frac{d\bar{P}}{dP}$ to $\mathcal{G}_t$ equal to $\Lambda_t$. Then the following lemma is a straightforward adaptation of Lemma 4.1 of Elliott (1994) (see the annexe).

Lemma 1 Under $\bar{P}$ the $Y_t$ are $\mathcal{N}(0, 1)$ i.i.d. random variables.
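The lemma can be checked numerically for a single time step. The sketch below is an illustration only: it takes the one-step density factor in the form that appears in the annexe proof, $\lambda = \sigma \phi(Y) / \phi(\varepsilon)$ with $\varepsilon = (Y - m) / \sigma$, where the conditional mean $m$ and scale $\sigma$ are treated as known constants. The helper names `density_factor` and `expect_under_p` are hypothetical.

```python
import math


def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def density_factor(y, mean, sigma):
    """One-step factor lambda = sigma * phi(y) / phi(eps), eps = (y - mean) / sigma."""
    eps = (y - mean) / sigma
    return sigma * phi(y) / phi(eps)


def expect_under_p(g, mean, sigma, lo=-12.0, hi=12.0, n=20000):
    """E[g(Y)] under P, where Y = mean + sigma * eps and eps ~ N(0, 1).

    Midpoint rule in eps over [lo, hi]; accurate enough for a sanity check.
    """
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        eps = lo + (k + 0.5) * h
        total += g(mean + sigma * eps) * phi(eps) * h
    return total


mean, sigma = 1.5, 0.7
# The factor has mean one, so it defines a probability measure.
mass = expect_under_p(lambda y: density_factor(y, mean, sigma), mean, sigma)
# Under the reweighted measure, Y has the N(0, 1) second moment.
second_moment = expect_under_p(
    lambda y: density_factor(y, mean, sigma) * y * y, mean, sigma)
```

Both `mass` and `second_moment` come out close to 1, which is what the lemma predicts for a standard Gaussian variable under the new measure.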

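Conversely, suppose we start with a probability measure $\bar{P}$ such that under $\bar{P}$

1. $(X_t)_{t \in \mathbb{N}}$ is a Markov chain with transition matrix $A$.

2. $(Y_t)_{t \in \mathbb{N}}$ is a sequence of $\mathcal{N}(0, 1)$ i.i.d. random variables.

We construct a new probability measure $P$ such that under $P$ we have $Y_{t+1} = F_{X_t}(Y_{t-p+1}^{t}) + \sigma_{X_t} \varepsilon_{t+1}$. To construct $P$ from $\bar{P}$, we introduce $\bar{\lambda}_l := (\lambda_l)^{-1}$ and $\bar{\Lambda}_t := (\Lambda_t)^{-1}$ and we define $P$ by putting $\left. \frac{dP}{d\bar{P}} \right|_{\mathcal{G}_t} = \bar{\Lambda}_t$.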

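Definition 2 Let $(H_t)_{t \in \mathbb{N}}$ be a sequence adapted to $(\mathcal{G}_t)$. We shall write:

$$\gamma_t(H_t) := \bar{E}[\bar{\Lambda}_t H_t \mid \mathcal{Y}_t], \qquad \gamma_{t,t}(H_t) := \gamma_t(H_t X_t) = \bar{E}[\bar{\Lambda}_t H_t X_t \mid \mathcal{Y}_t].$$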

The proof of the following theorem is a detailed adaptation of the proof of Theorem 5.3 of Elliott (1994) (see the annexe).

Theorem 1 Suppose $H_t$ is a scalar $\mathcal{G}$-adapted process of the form: $H_0$ is $\mathcal{F}_0$ measurable, $H_{t+1} = H_t + \alpha_{t+1} + \langle \beta_{t+1}, V_{t+1} \rangle + \delta_{t+1} f(Y_{t+1})$, $t \geq 0$, where $V_{t+1} = X_{t+1} - A X_t$, $f$ is a scalar valued function and $\alpha$, $\beta$, $\delta$ are $\mathcal{G}$-predictable processes ($\beta$ will be an $N$-dimensional vector process). Then:

$\gamma_{t+1}(H_{t+1} X_{t+1}) := \gamma_{t+1,t+1}(H_{t+1})$





[a, ei) 4> (Y t+1 ) 




where $a_i := A e_i$, $a_i^T$ is the transpose of $a_i$ and $\mathrm{diag}(a_i)$ is the matrix with the vector $a_i$ on the diagonal and zeros elsewhere.
We will now consider special cases of processes $H$. In all cases, we will calculate the quantity $\gamma_{t,t}(H_t)$ and deduce $\gamma_t(H_t)$ by summing the components of $\gamma_{t,t}(H_t)$. Then, we deduce from the conditional Bayes' theorem the conditional expectation of $H_t$:

$$\hat{H}_t := E[H_t \mid \mathcal{Y}_t] = \frac{\gamma_t(H_t)}{\gamma_t(1)}.$$
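As a small illustration of this normalisation step (a sketch with hypothetical function names, not the author's code): since the components of the unit vector $X_t$ sum to one, summing the components of the vector $\gamma_{t,t}(H_t)$ gives $\gamma_t(H_t)$, and summing those of $\gamma_t(X_t)$ gives $\gamma_t(1)$.

```python
def conditional_expectation(gamma_tt_H, gamma_t_X):
    """Return gamma_t(H_t) / gamma_t(1) from the two unnormalised vectors.

    gamma_tt_H : components of gamma_{t,t}(H_t)
    gamma_t_X  : components of gamma_t(X_t)
    """
    numerator = sum(gamma_tt_H)    # gamma_t(H_t)
    denominator = sum(gamma_t_X)   # gamma_t(1)
    return numerator / denominator


def state_probabilities(gamma_t_X):
    """Normalise gamma_t(X_t) into the conditional law of X_t."""
    total = sum(gamma_t_X)
    return [g / total for g in gamma_t_X]
```

Only the ratio matters, so any common scaling of the two vectors (for instance to avoid numerical underflow) leaves the result unchanged.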

3 Application to the Expectation (E.-step) of the 
E.M. algorithm 

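We will use the previous theorem in order to compute the conditional quantities needed by the E.M. algorithm.

Let $J_t^{rs} = \sum_{l=1}^{t} \langle X_{l-1}, e_r \rangle \langle X_l, e_s \rangle$ be the number of jumps from state $e_r$ to state $e_s$ up to time $t$. We obtain:

$$\gamma_{t+1,t+1}(J_{t+1}^{rs}) = \sum_{i=1}^{N} \langle \gamma_{t,t}(J_t^{rs}), \Gamma^i(Y_{t+1}) \rangle a_i + \langle \gamma_t(X_t), \Gamma^r(Y_{t+1}) \rangle a_{sr} e_s. \qquad (2)$$

Write now $O_t^r = \sum_{n=1}^{t} \langle X_n, e_r \rangle$ for the number of times, up to $t$, that $X$ occupies the state $e_r$. We obtain:

$$\gamma_{t+1,t+1}(O_{t+1}^r) = \sum_{i=1}^{N} \langle \gamma_{t,t}(O_t^r), \Gamma^i(Y_{t+1}) \rangle a_i + \langle \gamma_t(X_t), \Gamma^r(Y_{t+1}) \rangle a_r. \qquad (3)$$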
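The two recursions above can be sketched in code for an AR(1) switching model. Several points are assumptions made for this illustration and are not stated in the text: the weight $\Gamma^i(y)$ is taken to be the regime-$i$ Gaussian density of the observation divided by the standard normal density $\phi(y)$; $\langle \gamma, \Gamma^i \rangle$ is read as the $i$th component of $\gamma$ times $\Gamma^i$; the observation $Y_{t+1}$ is driven by $X_t$, as in the annexe proof; and each count is incremented at step $t+1$ using $X_t$. The function name `forward_counts` is hypothetical.

```python
import math


def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def forward_counts(ys, A, coeffs, sigmas, pi0, r, s):
    """Forward-only recursions for occupation and jump counts (sketch).

    A[i][j]   : P(X_{t+1} = e_i | X_t = e_j)  (assumed layout)
    coeffs[i] : (c_i, b_i), so that F_{e_i}(y) = c_i + b_i * y
    pi0       : law of X_0
    Returns (E[O^r_T | Y], E[J^{rs}_T | Y]).
    """
    n = len(pi0)
    g_x = list(pi0)      # gamma_0(X_0)
    g_o = [0.0] * n      # gamma_{0,0}(O^r_0)
    g_j = [0.0] * n      # gamma_{0,0}(J^{rs}_0)
    for t in range(len(ys) - 1):
        y_prev, y_next = ys[t], ys[t + 1]
        # Gamma^i(Y_{t+1}) for every regime i (assumed form, see lead-in).
        gam = []
        for i in range(n):
            c, b = coeffs[i]
            eps = (y_next - (c + b * y_prev)) / sigmas[i]
            gam.append(phi(eps) / (sigmas[i] * phi(y_next)))
        # gamma_{t+1}(X_{t+1}) = sum_i <gamma_t(X_t), Gamma^i> a_i
        new_x = [sum(g_x[i] * gam[i] * A[k][i] for i in range(n))
                 for k in range(n)]
        # Occupation recursion: propagated term plus new visit to e_r.
        new_o = [sum(g_o[i] * gam[i] * A[k][i] for i in range(n))
                 + g_x[r] * gam[r] * A[k][r] for k in range(n)]
        # Jump recursion: propagated term plus new jump e_r -> e_s.
        new_j = [sum(g_j[i] * gam[i] * A[k][i] for i in range(n))
                 for k in range(n)]
        new_j[s] += g_x[r] * gam[r] * A[s][r]
        g_x, g_o, g_j = new_x, new_o, new_j
    norm = sum(g_x)      # gamma_T(1)
    return sum(g_o) / norm, sum(g_j) / norm
```

No backward pass is needed: each statistic is carried forward as an $N$-vector alongside $\gamma_t(X_t)$. For long series the unnormalised vectors can underflow, so a practical version would rescale all three by the same constant at each step.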

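For the regression functions, the M-step of the E.M. algorithm is achieved by finding the parameters minimising the weighted sum of squares:

$$\sum_{t=1}^{n} \gamma_i(t) \left( y_t - (a_0^i + a_1^i y_{t-1} + \cdots + a_p^i y_{t-p}) \right)^2$$

where $\gamma_i(t)$ is the conditional expectation of the hidden state $e_i$ at time $t$ given the observations up to $y_n$.

Write $\psi(t) = (1, y_{t-1}, \ldots, y_{t-p})$ and $\theta_i = (a_0^i, \ldots, a_p^i)$, and suppose that the matrix $\left[ \sum_{t=1}^{n} \gamma_i(t) \psi^T(t) \psi(t) \right]$ is invertible. The estimator $\hat{\theta}_i(n)$ of $\theta_i$ is given by:

$$\hat{\theta}_i(n) = \left[ \sum_{t=1}^{n} \gamma_i(t) \psi^T(t) \psi(t) \right]^{-1} \sum_{t=1}^{n} \gamma_i(t) \psi^T(t) y_t \qquad (4)$$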
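The weighted regression can be illustrated with a short, dependency-free sketch. The names `solve_linear` and `weighted_ar_fit` are hypothetical, and the weights are passed in directly, whereas in the E.M. algorithm they would be the conditional state probabilities.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def weighted_ar_fit(ys, weights, p):
    """Minimise sum_t w_t (y_t - a_0 - a_1 y_{t-1} - ... - a_p y_{t-p})^2.

    weights[t] is the weight attached to ys[t]; the first p entries are
    unused because those observations only serve as lags.
    """
    dim = p + 1
    gram = [[0.0] * dim for _ in range(dim)]
    rhs = [0.0] * dim
    for t in range(p, len(ys)):
        psi = [1.0] + [ys[t - k] for k in range(1, p + 1)]
        for i in range(dim):
            rhs[i] += weights[t] * psi[i] * ys[t]
            for j in range(dim):
                gram[i][j] += weights[t] * psi[i] * psi[j]
    # Normal equations: gram * theta = rhs.
    return solve_linear(gram, rhs)
```

The function accumulates the weighted Gram matrix and right-hand side of the normal equations and solves them, which is the estimator above written for a single regime.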



Hence, in order to compute $\hat{\theta}_i(n)$, we need to estimate the conditional expectation of the following processes:

1. $$TA_{t+1}^r(j) = \sum_{l=1}^{t+1} \langle X_l, e_r \rangle Y_{l-j} Y_{l+1}$$
for $-1 \leq j \leq p$ and $1 \leq r \leq N$.

2. $$TB_{t+1}^r(i, j) = \sum_{l=1}^{t+1} \langle X_l, e_r \rangle Y_{l-j} Y_{l-i}$$
for $0 \leq j, i \leq p$ and $1 \leq r \leq N$.

3. $$TC_{t+1}^r = \sum_{l=1}^{t+1} \langle X_l, e_r \rangle Y_{l+1}.$$

4. $$TV_{t+1}^r(j) = \sum_{l=1}^{t+1} \langle X_l, e_r \rangle Y_{l-j}$$
for $0 \leq j \leq p$ and $1 \leq r \leq N$.

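Applying Theorem 1 with $H_{t+1}(j) = TA_{t+1}^r(j)$, $H_0 = 0$, $\alpha_{t+1} = 0$, $\beta_{t+1} = 0$, $\delta_{t+1} = \langle X_t, e_r \rangle Y_{t-j}$ and $f(Y_{t+1}) = Y_{t+1}$ if $j \neq -1$, or $\delta_{t+1} = \langle X_t, e_r \rangle$ and $f(Y_{t+1}) = Y_{t+1}^2$ if $j = -1$, gives us

$$\gamma_{t+1,t+1}(TA_{t+1}^r(j)) = \sum_{i=1}^{N} \langle \gamma_{t,t}(TA_t^r(j)), \Gamma^i(Y_{t+1}) \rangle a_i + \langle \gamma_t(X_t), \Gamma^r(Y_{t+1}) \rangle Y_{t-j} Y_{t+1} a_r, \qquad (5)$$

where $a_r$ is the $r$-th column of $A$.

Then, applying Theorem 1 with $H_{t+1}(j) = TB_{t+1}^r(i, j)$, $H_0 = 0$, $\alpha_{t+1} = 0$, $\beta_{t+1} = 0$, $\delta_{t+1} = \langle X_t, e_r \rangle Y_{t-j} Y_{t-i}$ and $f(Y_{t+1}) = 1$ gives:

$$\gamma_{t+1,t+1}(TB_{t+1}^r(i, j)) = \sum_{i=1}^{N} \langle \gamma_{t,t}(TB_t^r(i, j)), \Gamma^i(Y_{t+1}) \rangle a_i + \langle \gamma_t(X_t), \Gamma^r(Y_{t+1}) \rangle Y_{t-j} Y_{t-i} a_r. \qquad (6)$$

Next, applying Theorem 1 with $H_{t+1} = TC_{t+1}^r$, $H_0 = 0$, $\alpha_{t+1} = 0$, $\beta_{t+1} = 0$, $\delta_{t+1} = \langle X_t, e_r \rangle$ and $f(Y_{t+1}) = Y_{t+1}$ gives:

$$\gamma_{t+1,t+1}(TC_{t+1}^r) = \sum_{i=1}^{N} \langle \gamma_{t,t}(TC_t^r), \Gamma^i(Y_{t+1}) \rangle a_i + \langle \gamma_t(X_t), \Gamma^r(Y_{t+1}) \rangle Y_{t+1} a_r. \qquad (7)$$

Finally, applying Theorem 1 with $H_{t+1}(j) = TV_{t+1}^r(j)$, $H_0 = 0$, $\alpha_{t+1} = 0$, $\beta_{t+1} = 0$, $\delta_{t+1} = \langle X_t, e_r \rangle Y_{t-j}$ and $f(Y_{t+1}) = 1$ gives:

$$\gamma_{t+1,t+1}(TV_{t+1}^r(j)) = \sum_{i=1}^{N} \langle \gamma_{t,t}(TV_t^r(j)), \Gamma^i(Y_{t+1}) \rangle a_i + \langle \gamma_t(X_t), \Gamma^r(Y_{t+1}) \rangle Y_{t-j} a_r. \qquad (8)$$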

The "Maximisation" pass of the E.M. algorithm is now achieved by updating the parameters in the following way.

Parameters of the transition matrix. The parameters of the transition matrix will be updated with the formula:

$$\hat{a}_{sr} = \frac{\gamma_T(J_T^{rs})}{\gamma_T(O_T^r)} \qquad (9)$$
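A minimal sketch of this update, with hypothetical names: it takes the conditional expected jump counts and occupation times (or the corresponding unnormalised $\gamma_T$ quantities, since the common factor $\gamma_T(1)$ cancels in the ratio) and returns the new transition matrix in the layout `A[s][r] = P(X_{t+1} = e_s | X_t = e_r)`.

```python
def update_transition(jump_hat, occ_hat):
    """Re-estimate the transition matrix from expected counts.

    jump_hat[r][s] : expected number of jumps from e_r to e_s
    occ_hat[r]     : expected time spent in e_r
    Returns A with A[s][r] = jump_hat[r][s] / occ_hat[r].
    """
    n = len(occ_hat)
    return [[jump_hat[r][s] / occ_hat[r] for r in range(n)]
            for s in range(n)]


# Example with made-up expected counts for two states.
A_new = update_transition([[3.0, 1.0], [2.0, 2.0]], [4.0, 4.0])
```

When the jump counts out of each state add up to its occupation time, every column of the result sums to one, as a transition matrix in this layout should.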

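Parameters of the regression functions. For $1 \leq r \leq N$, let $\hat{R}^r := (\hat{R}_{ij}^r)_{1 \leq i, j \leq p+1}$ be the symmetric matrix with $\hat{R}_{11}^r = 1$, $\hat{R}_{1j}^r = \hat{R}_{j1}^r = \widehat{TV}^r(j)$, $\hat{R}_{ij}^r = \widehat{TB}^r(i-1, j-1)$, and let $\hat{C}^r = (\widehat{TC}^r, (\widehat{TA}^r(i))_{0 \leq i \leq p})$. We can then compute the updated parameter $\hat{\theta}_r$ of the regression function $F_{e_r}$ with the formula:

$$\hat{\theta}_r = (\hat{R}^r)^{-1} \hat{C}^r \qquad (10)$$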

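Parameters of the variances. Finally, thanks to the previous conditional expectations, we can directly calculate the parameters $\hat{\sigma}_1, \ldots, \hat{\sigma}_N$, since for $1 \leq r \leq N$ the conditional expectation of the mean square error of the $r$th model is

$$\hat{\sigma}_r^2 = \frac{1}{\hat{O}^r} \left( \widehat{TA}^r(-1) + \hat{\theta}_r^T \hat{R}^r \hat{\theta}_r - 2 \hat{\theta}_r^T \hat{C}^r \right). \qquad (11)$$

This completes the M-step of the E.M. algorithm.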
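The variance update is the expansion of a weighted mean square error in terms of sufficient statistics, which the following sketch illustrates. It assumes, for the sake of the example, that all statistics are unnormalised weighted sums, so that the top-left entry of the moment matrix is the total weight; if the statistics have already been divided by the occupation time, pass `occupation=1.0`. The function name `regime_variance` is hypothetical.

```python
def regime_variance(s_yy, R, C, theta, occupation):
    """Weighted mean square error from sufficient statistics.

    s_yy       : weighted sum of squared responses
    R          : weighted moment matrix of the regressors
    C          : weighted cross moments of regressors and response
    theta      : regression coefficients
    occupation : total weight
    """
    dim = len(theta)
    quad = sum(theta[i] * R[i][j] * theta[j]
               for i in range(dim) for j in range(dim))
    cross = sum(theta[i] * C[i] for i in range(dim))
    return (s_yy + quad - 2.0 * cross) / occupation
```

The identity behind it is $\sum_t w_t (y_t - \psi(t)\theta)^2 = \sum_t w_t y_t^2 + \theta^T R \theta - 2 \theta^T C$, which holds for any $\theta$, so the residuals never need to be formed explicitly.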
4 Conclusion

Using the discrete Girsanov measure transform, we propose a new way to apply the E.M. algorithm in the case of Markov-switching linear autoregressions.

Note that, contrary to the Baum and Welch algorithm, we do not use backward recurrence, although the computational cost increases slightly, since the number of operations is multiplied by a factor that depends on $N$, where $N$ is the number of hidden states of the Markov chain.
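Annexe

Proof of lemma 1

Lemma 2 Under $\bar{P}$ the $Y_t$ are $\mathcal{N}(0, 1)$ i.i.d. random variables.

Proof The proof is based on the conditional Bayes' theorem; it is a simple rewriting of the proof of Elliott. We have

$$\bar{P}(Y_{t+1} \leq \tau \mid \mathcal{G}_t) = \bar{E}[1_{\{Y_{t+1} \leq \tau\}} \mid \mathcal{G}_t].$$

Thanks to the conditional Bayes' theorem we have:

$$\bar{E}[1_{\{Y_{t+1} \leq \tau\}} \mid \mathcal{G}_t] = \frac{E[\Lambda_{t+1} 1_{\{Y_{t+1} \leq \tau\}} \mid \mathcal{G}_t]}{E[\Lambda_{t+1} \mid \mathcal{G}_t]} = \frac{\Lambda_t \, E[\lambda_{t+1} 1_{\{Y_{t+1} \leq \tau\}} \mid \mathcal{G}_t]}{\Lambda_t \, E[\lambda_{t+1} \mid \mathcal{G}_t]}.$$

Now

$$E[\lambda_{t+1} \mid \mathcal{G}_t] = \int_{-\infty}^{\infty} \frac{\langle \sigma, X_t \rangle \phi(Y_{t+1})}{\phi(\varepsilon_{t+1})} \, \phi(\varepsilon_{t+1}) \, d\varepsilon_{t+1} = \int_{-\infty}^{\infty} \langle \sigma, X_t \rangle \, \phi\left( F_{X_t}(Y_{t-p+1}^{t}) + \langle \sigma, X_t \rangle \varepsilon_{t+1} \right) d\varepsilon_{t+1} = 1$$

and since $\varepsilon_{t+1} = \frac{Y_{t+1} - F_{X_t}(Y_{t-p+1}^{t})}{\langle \sigma, X_t \rangle}$,

$$\begin{aligned} \bar{P}(Y_{t+1} \leq \tau \mid \mathcal{G}_t) &= E[\lambda_{t+1} 1_{\{Y_{t+1} \leq \tau\}} \mid \mathcal{G}_t] \\ &= \int_{-\infty}^{\infty} \frac{\langle \sigma, X_t \rangle \phi(Y_{t+1})}{\phi(\varepsilon_{t+1})} 1_{\{Y_{t+1} \leq \tau\}} \, \phi(\varepsilon_{t+1}) \, d\varepsilon_{t+1} \\ &= \int_{-\infty}^{\tau} \phi(y_{t+1}) \, dy_{t+1} = \bar{P}(Y_{t+1} \leq \tau). \end{aligned}$$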



Proof of Theorem 2

Theorem 2 Suppose $H_t$ is a scalar $\mathcal{G}$-adapted process of the form: $H_0$ is $\mathcal{F}_0$ measurable, $H_{t+1} = H_t + \alpha_{t+1} + \langle \beta_{t+1}, V_{t+1} \rangle + \delta_{t+1} f(Y_{t+1})$, $t \geq 0$, where $V_{t+1} = X_{t+1} - A X_t$, $f$ is a scalar valued function and $\alpha$, $\beta$, $\delta$ are $\mathcal{G}$-predictable processes ($\beta$ will be an $N$-dimensional vector process). Then:

$\gamma_{t+1}(H_{t+1} X_{t+1}) := \gamma_{t+1,t+1}(H_{t+1})$

where $a_i := A e_i$, $a_i^T$ is the transpose of $a_i$ and $\mathrm{diag}(a_i)$ is the matrix with the vector $a_i$ on the diagonal and zeros elsewhere.



Proof Here again it is only a rewriting of the proof of Elliott. We begin with the two following results:

Result 1

$$\bar{E}[V_{t+1} \mid \mathcal{Y}_{t+1}] = \bar{E}\left[ \bar{E}[V_{t+1} \mid \mathcal{G}_t, \mathcal{Y}_{t+1}] \mid \mathcal{Y}_{t+1} \right] = \bar{E}\left[ \bar{E}[V_{t+1} \mid \mathcal{G}_t] \mid \mathcal{Y}_{t+1} \right] = 0. \qquad (13)$$

Result 2

$$X_{t+1} X_{t+1}^T = A X_t (A X_t)^T + A X_t V_{t+1}^T + V_{t+1} (A X_t)^T + V_{t+1} V_{t+1}^T.$$

Since $X_t$ is of the form $(0, \ldots, 0, 1, 0, \ldots, 0)$ we have

$$X_{t+1} X_{t+1}^T = \mathrm{diag}(X_{t+1}) = \mathrm{diag}(A X_t) + \mathrm{diag}(V_{t+1})$$

so

$$V_{t+1} V_{t+1}^T = \mathrm{diag}(A X_t) + \mathrm{diag}(V_{t+1}) - A \, \mathrm{diag}(X_t) A^T - A X_t V_{t+1}^T - V_{t+1} (A X_t)^T.$$

Finally we obtain the result

$$\langle V_{t+1} \rangle := E[V_{t+1} V_{t+1}^T \mid \mathcal{F}_t] = E[V_{t+1} V_{t+1}^T \mid X_t] = \mathrm{diag}(A X_t) - A \, \mathrm{diag}(X_t) A^T. \qquad (14)$$

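Main proof We have

$$\begin{aligned} \gamma_{t+1,t+1}(H_{t+1}) &= \bar{E}[\bar{\Lambda}_{t+1} H_{t+1} X_{t+1} \mid \mathcal{Y}_{t+1}] \\ &= \bar{E}\left[ (A X_t + V_{t+1}) \left( H_t + \alpha_{t+1} + \langle \beta_{t+1}, V_{t+1} \rangle + \delta_{t+1} f(y_{t+1}) \right) \bar{\Lambda}_{t+1} \mid \mathcal{Y}_{t+1} \right]. \end{aligned}$$

Thanks to equation (13),

$$\gamma_{t+1,t+1}(H_{t+1}) = \bar{E}\left[ \left( (H_t + \alpha_{t+1} + \delta_{t+1} f(y_{t+1})) A X_t + \langle \beta_{t+1}, V_{t+1} \rangle V_{t+1} \right) \bar{\Lambda}_{t+1} \mid \mathcal{Y}_{t+1} \right],$$

so:

$$\gamma_{t+1,t+1}(H_{t+1}) = \sum_{j=1}^{N} \bar{E}\left[ (H_t + \alpha_{t+1} + \delta_{t+1} f(y_{t+1})) \langle A X_t, e_j \rangle e_j \, \bar{\Lambda}_{t+1} \mid \mathcal{Y}_{t+1} \right] + \bar{E}\left[ \langle \beta_{t+1}, V_{t+1} \rangle V_{t+1} \bar{\Lambda}_{t+1} \mid \mathcal{Y}_{t+1} \right],$$

hence

$$\gamma_{t+1,t+1}(H_{t+1}) = \sum_{j=1}^{N} \sum_{i=1}^{N} \bar{E}\left[ (H_t + \alpha_{t+1} + \delta_{t+1} f(y_{t+1})) \langle X_t, e_i \rangle \bar{\Lambda}_{t+1} a_{ji} e_j \mid \mathcal{Y}_{t+1} \right] + \bar{E}\left[ \langle \beta_{t+1}, V_{t+1} \rangle V_{t+1} \bar{\Lambda}_{t+1} \mid \mathcal{Y}_{t+1} \right].$$

We have written $a_i = A e_i$, so

$$\gamma_{t+1,t+1}(H_{t+1}) = \sum_{i=1}^{N} \bar{E}\left[ (H_t + \alpha_{t+1} + \delta_{t+1} f(y_{t+1})) \langle X_t, e_i \rangle \bar{\Lambda}_{t+1} \mid \mathcal{Y}_{t+1} \right] a_i + \bar{E}\left[ \langle \beta_{t+1}, V_{t+1} \rangle V_{t+1} \bar{\Lambda}_{t+1} \mid \mathcal{Y}_{t+1} \right].$$

For a process $H_t$ adapted to the sigma-algebra $\mathcal{G}_t$ we have

$$\bar{E}[\bar{\Lambda}_{t+1} H_t \mid \mathcal{Y}_{t+1}] = \sum_{i=1}^{N} \langle \gamma_t(H_t X_t), \Gamma^i(y_{t+1}) \rangle.$$

So, for all $e_r \in E$,

$$\bar{E}[\bar{\Lambda}_{t+1} H_t \langle X_t, e_r \rangle \mid \mathcal{Y}_{t+1}] = \sum_{i=1}^{N} \langle \gamma_t(H_t X_t \langle X_t, e_r \rangle), \Gamma^i(y_{t+1}) \rangle = \sum_{i=1}^{N} \langle \gamma_t(H_t X_t X_t^T e_r), \Gamma^i(y_{t+1}) \rangle.$$

But we have also:

$$\gamma_t(H_t X_t X_t^T) = \sum_{i=1}^{N} \langle \gamma_t(H_t X_t), e_i \rangle e_i e_i^T.$$

So we have:

$$\bar{E}[\bar{\Lambda}_{t+1} H_t \langle X_t, e_r \rangle \mid \mathcal{Y}_{t+1}] = \sum_{i=1}^{N} \langle \gamma_t(H_t X_t X_t^T e_r), \Gamma^i(y_{t+1}) \rangle = \langle \gamma_t(H_t X_t), \Gamma^r(y_{t+1}) \rangle.$$

Since $\alpha$, $\beta$, $\delta$ are $\mathcal{G}$-predictable and $f(y_{t+1})$ is measurable with respect to $\mathcal{Y}_{t+1}$, the result (14) yields the conclusion. ■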